UNCERTAINTIES ON NETWORKS (PI: N. Cressie)


Objectives 

Networks as models can be found in many disciplines, including biology, computer science, engineering,
geography, mathematics, physics, sociology, and statistics. However, there are uncertainties
associated with imperfect knowledge of a network’s nodes and dependencies, as well as with
noise-corrupted  variables  defined  on  the  network.  Networks  have  become  important  components  of 
complex  representations  of  reality  and,  when  built  into  a  hierarchical  statistical  modeling  structure, 
they  allow  partitioning  of  joint  probability  distributions  that  seem  unmanageable  at  first  glance. 
Thus, a statistical approach to network analysis is natural from both a probabilistic and an inferential
point of view. In this research, we study spatial and spatio-temporal networks through
graph  theory  (e.g.,  Lauritzen,  1996;  Cressie  and  Davidson,  1998).  A  chain  graph  is  defined  to  be  a 
combination  of  undirected  graphs  and  acyclic  directed  graphs  (ADGs),  with  the  overall  structure 
being  guided  by  an  ADG.  The  undirected  parts  account  for  the  spatial  dependence,  the  directed 
parts  can  be  used  to  account  for  the  temporal  dependence,  and  the  guiding  ADG  captures  the 
spatio-temporal  interactions. 

Impact 

Models  in  space  and  space-time  are  essential  for  representing  the  battlespace.  For  example,  they 
are  used  in  estimating  a  dynamically  evolving  danger  function  or  in  predicting  a  waypoint  in  the 
presence  of  uncertainties.  In  netcentric  warfare,  the  uncertainties  reside  not  only  in  the  variables 
at  the  network’s  nodes,  but  also  in  the  presence  or  absence  of  network  nodes  and  the  dependencies 
between  the  nodes.  This  occurs  when  a  node  may  only  be  operational  intermittently  or  when  the 
enemy’s  network  is  unknown,  apart  from  a  few  obvious  nodes.  In  this  research,  we  incorporate 
spatial  and  spatio-temporal  dependencies  into  the  analysis  of  network  data. 

References 
Technical Summary

Recently,  a  great  amount  of  attention  has  been  paid  to  random  networks,  which  are  widely  used 
to  represent  complex  relationships  in  many  areas  (e.g.,  World  Wide  Web  communications,  social 
studies,  epidemic  dynamics,  molecular-evolution  processes,  etc.).  According  to  Lauritzen  (1996), 
a  random  network  can  be  modeled  through  a  mathematical  graph  defined  as  G  =  (V,E),  which 
consists  of  a  finite  set  of  nodes  (or  vertices),  V,  and  a  set  of  edges,  E,  where  nodes  represent 
individuals  or  objects,  while  edges  specify  their  relationships. 

Graphs  can  be  further  divided  into  different  classes,  according  to  the  nature  of  their  edges  as  well 
as the paths formed by edges. Our research focuses on one type of graph called a chain graph, made
up of undirected graphs and acyclic directed graphs (ADGs, sometimes abbreviated as DAGs).
ADGs  consist  of  only  directed  edges  without  any  cycles,  and  thus  they  can  specify  direct  relations 
(e.g.,  conditional  dependencies,  causal  relations)  between  variables  defined  on  the  graph’s  nodes; 
see  Lauritzen  (1996),  Kolaczyk  (2009),  and  Koski  and  Noble  (2009)  for  further  details. 

In recent research on statistical-dependence modeling, Bayesian networks are widely used to characterize
joint multivariate probability distributions, which can define properties of conditional independence
or causal relations between variables in a complex process. In the research conducted
under  this  grant,  we  incorporate  spatial  and  spatio-temporal  dependencies  into  the  analysis  of 
network  data. 

An  explosion  of  ideas  has  been  generated  on  dependence  modeling  based  on  networks  (e.g.,  Friedman 
et  al.,  2000;  Ellis  and  Wong,  2008).  According  to  Koski  and  Noble  (2009),  a  Bayesian  network,  BN  = 
(G,p), can be modeled through an ADG, G, and its probability distribution, p. The Erdős-Rényi
model (E-R model) has been widely used in the past to capture the probability distributions of ADGs
(Erdős and Rényi, 1959). This model belongs to the family of exponential random graph models
(ERGM) (e.g., Hunter and Handcock, 2005), and it assumes equal and independent probabilities
of having an edge between any pair of nodes within a graph (referred to as the “equal and independent
assumptions”). The E-R model is also frequently used as a prior distribution for ADGs with discrete
data. The main appeal of the E-R model is that it can lead to a closed-form posterior distribution
(e.g., Ellis and Wong, 2008). However, the equal and independent assumptions of the
E-R model are not realistic, especially for high-dimensional networks. Furthermore, its sufficient
statistic captures only one property of a random graph, namely the number of edges; all the other
important properties, such as the directions of edges and the patterns formed by the edges, are ignored.

In what follows, we consider more general ADGs, based on the level-set definition proposed by
Cressie and Davidson (1998). We develop a sequential-modeling strategy, through which we can
capture the probability distributions of ADGs, but we avoid strong assumptions such as the equal
and independent assumptions. Furthermore, our level-set model allows more graphical information
to be used; for example, we consider not only the number of edges, but also certain structural
features of the ADG, including levels of the ADG, connections between levels (definitions of “levels” and
“connections” will be given later), directions of edges between nodes, etc. Based on our level-set
modeling strategy, we also develop an algorithm to generate ADGs efficiently.

We introduce the following notation:

• $G$ denotes an ADG, and $G = (V, E)$.

• $V$ denotes the finite set of nodes in $G$; that is, $V = \{v_1, \ldots, v_n\}$, where $n$ is the total number
of nodes and $n$ is given.

• $E$ denotes the set of directed edges in $G$:

$E = \{(v_i, v_j) : \text{there is a directed edge from } v_i \text{ to } v_j;\ v_i, v_j \in V\}$. (1)

• $\mathrm{ch}(v_i)$ denotes the children of node $v_i$; that is, for $v_i \in V$,

$\mathrm{ch}(v_i) = \{v_j \in V : (v_i, v_j) \in E\}$. (2)

• $\mathrm{pa}(v_i)$ denotes the parents of node $v_i$; that is, for $v_i \in V$,

$\mathrm{pa}(v_i) = \{v_j \in V : (v_j, v_i) \in E\}$. (3)

• $V_{\min}$ denotes the set of vertices with no parents; that is,

$V_{\min} = \{v_i \in V : \mathrm{pa}(v_i) = \emptyset\}$. (4)

• $\mathrm{covr}(B)$ denotes the cover of a subset of nodes $B \subset V$, which is the subset of nodes that are
not in $B$ but whose parents are all in $B$ (Cressie and Davidson, 1998); that is,

$\mathrm{covr}(B) = \{v_i \in V : \mathrm{pa}(v_i) \subset B \text{ and } v_i \notin B\}$. (5)


Notice  that  the  definition  of  the  cover  of  a  subset  of  nodes  is  different  from  the  Markov  blanket 
(e.g.,  Pearl,  1988);  for  a  set  of  nodes,  the  Markov  blanket  consists  of  their  children,  their  parents,  as 
well  as  their  children’s  other  parents.  In  other  words,  the  Markov  blanket  contains  all  the  variables 
that shield the subset of nodes from the rest of the network. However, $\mathrm{covr}(B)$ only includes certain
descendants: $\mathrm{covr}(B) \subset \mathrm{ch}(B)$.
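The notation above can be illustrated with a minimal Python sketch. The five-node graph, the node names, and the function names are hypothetical and chosen only for illustration; the subset relation in (5) is read here as "subset of or equal to", and B is taken to contain all parentless nodes, as it does when the cover is applied recursively to level sets.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (parent, child): a directed edge parent -> child


def pa(v: str, edges: Set[Edge]) -> Set[str]:
    """Parents of v, as in equation (3)."""
    return {a for a, b in edges if b == v}


def ch(v: str, edges: Set[Edge]) -> Set[str]:
    """Children of v, as in equation (2)."""
    return {b for a, b in edges if a == v}


def v_min(nodes: Set[str], edges: Set[Edge]) -> Set[str]:
    """Nodes with no parents, as in equation (4)."""
    return {v for v in nodes if not pa(v, edges)}


def covr(B: Set[str], nodes: Set[str], edges: Set[Edge]) -> Set[str]:
    """Cover of B, as in equation (5): nodes outside B whose parents all lie in B."""
    return {v for v in nodes if v not in B and pa(v, edges) <= B}


def markov_blanket(B: Set[str], edges: Set[Edge]) -> Set[str]:
    """Children, parents, and children's other parents of B, excluding B itself."""
    kids = set().union(*(ch(v, edges) for v in B))
    parents = set().union(*(pa(v, edges) for v in B))
    co_parents = set().union(*(pa(k, edges) for k in kids))
    return (kids | parents | co_parents) - set(B)


# Hypothetical ADG: a -> c, b -> c, c -> d, e -> d
nodes = {"a", "b", "c", "d", "e"}
edges = {("a", "c"), ("b", "c"), ("c", "d"), ("e", "d")}

B = v_min(nodes, edges)               # the parentless nodes a, b, e
cover_B = covr(B, nodes, edges)       # only c: d is excluded because its parent c is not in B
blanket_B = markov_blanket(B, edges)  # c and d: every child of B is included
```

In this example the cover is strictly smaller than the Markov blanket: node d is a child of e, so it belongs to the blanket, but it is not in the cover because one of its parents lies outside B.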

From Cressie and Davidson (1998), an ADG with a finite number of nodes has level sets $L = \{L_0, \ldots, L_d\}$,
formed by a specific partition of the ADG that can be specified recursively as

$L_i = \begin{cases} V_{\min}, & \text{if } i = 0; \\ \mathrm{covr}(\cup\{L_k : k = 0, \ldots, i-1\}), & \text{if } 0 < i \le d, \end{cases}$ (6)


where $(d+1)$ is the total number of level sets. For an ADG with $n$ nodes, it is straightforward to
see that $1 \le d+1 \le n$. The important properties of level sets are summarized below (Cressie
and Davidson, 1998):
1. Every node of an ADG should belong to one and only one level set; specifically,

$L_i \cap L_j = \emptyset$ for $i \ne j$; $i, j = 0, \ldots, d$. (7)

In other words, the $(d+1)$ level sets, $L = \{L_0, \ldots, L_d\}$, together form a $(d+1)$-nonempty-partition
of the ADG.

2. Every node in a non-minimal level set should have at least one parent from its adjacent level
set that is of lower order; that is, for any $v \in L_i$, $0 < i \le d$, there exists a node $u \in L_{i-1}$
such that $u \in \mathrm{pa}(v)$.

3. The directed edges can only go from nodes in lower-order level sets to nodes in higher-order
level sets; that is, if $v, u \in V$, $v \in L_i$, $0 < i \le d$, and $u \in \mathrm{pa}(v)$, then
$u \in \cup\{L_k : k = 0, \ldots, i-1\}$.

4. The nodes in the same level set should be independent; that is, there are no directed edges
within any level set. In other words, if $v, u \in V$ and $v, u \in L_i$, $i = 0, \ldots, d$, and $v \ne u$, then
there should be no directed edge between $v$ and $u$.

5. If $v \in L_i$, $0 < i \le d$, then there should be a path of length $i$ from some node $u \in L_0$ to $v$.
Furthermore, no path to $v$ can be longer than length $i$.

6. The maximum length of a path in an ADG with $(d+1)$ level sets, $L = \{L_0, \ldots, L_d\}$, is $d$.
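The recursion in (6) and some of the properties listed above can be checked numerically. The following Python sketch is only an illustration: the seven-node graph is hypothetical (it is not claimed to be the graph of Figure 1), and the function names are assumptions introduced here.

```python
def level_sets(nodes, edges):
    """Level sets of an ADG via recursion (6): L_0 = V_min, L_i = covr(L_0 u ... u L_{i-1})."""
    pa = {v: set() for v in nodes}
    for a, b in edges:            # each edge is (parent, child)
        pa[b].add(a)
    levels = [{v for v in pa if not pa[v]}]   # L_0: nodes without parents
    placed = set(levels[0])
    while len(placed) < len(pa):
        # cover of the nodes placed so far
        nxt = {v for v in pa if v not in placed and pa[v] <= placed}
        if not nxt:
            raise ValueError("no level sets exist: the graph contains a directed cycle")
        levels.append(nxt)
        placed |= nxt
    return levels


def longest_path_into(v, edges):
    """Length of the longest directed path ending at v (simple recursion, small graphs only)."""
    parents = [a for a, b in edges if b == v]
    if not parents:
        return 0
    return 1 + max(longest_path_into(p, edges) for p in parents)


# Hypothetical ADG with 7 nodes
nodes = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
edges = [("v1", "v3"), ("v2", "v4"), ("v3", "v5"), ("v4", "v6"),
         ("v1", "v6"), ("v5", "v7"), ("v2", "v7")]

levels = level_sets(nodes, edges)

# Property 5: a node in L_i is the end point of a longest path of length exactly i
property_5_holds = all(
    longest_path_into(v, edges) == i for i, L_i in enumerate(levels) for v in L_i
)
```

For this graph the recursion gives four level sets, {v1, v2}, {v3, v4}, {v5, v6}, and {v7}, so d = 3 and the longest path has length 3, in agreement with property 6.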

According  to  the  definition  and  properties  mentioned  above,  we  notice  that  different  ADGs  can 
give  rise  to  the  same  level-sets  structure;  however,  given  an  ADG,  the  level-sets  structure  should  be 
unique.  This  is  an  important  property  that  differentiates  the  notion  of  level  sets  from  other  modern 
graph-partition  strategies  (e.g.,  partitions  to  obtain  minimal  edges  among  partitions  but  maximal 
edges  within  partitions;  see  Newman,  2004).  Those  types  of  graph  partitions  are  not  unique  for  a 
given  ADG. 

Figure  1  shows  an  ADG  with  level-sets  structure  satisfying  all  the  properties  mentioned  above.  For 
example, nodes $v_1$ and $v_2$ are in the minimal level set $L_0$, because they have no parents; there
is  no  directed  edge  within  each  level  set;  directed  edges  always  go  from  lower-order  level  sets  to 
higher-order  level  sets,  and  so  forth. 

From  the  definitions  and  properties  of  level  sets  given  above,  we  can  see  that  the  level-sets  structure 
of  an  ADG  involves  much  more  graphical  information  than  just  the  number  of  edges  found  in  the 
E-R  model.  In  order  to  specify  the  structure  between  level-sets,  we  introduce  the  connection  matrix, 
but  we  first  need  to  define  the  adjacency  matrix. 

From Lauritzen (1996), we can use an adjacency matrix, $Y = [y_{ij}]_{n \times n}$, to uniquely specify the
structure of an ADG with $n$ nodes:

$y_{ij} = \begin{cases} 1, & \text{if there is a directed edge from } v_i \text{ to } v_j, \text{ where } v_i, v_j \in V; \\ 0, & \text{otherwise.} \end{cases}$ (8)

Figure 1: An ADG with 7 nodes and 4 level sets

Similarly, we can define a connection matrix, $C = [c_{kl}]_{(d+1) \times (d+1)}$, to specify the structure between
level sets. Consider a given ADG with $(d+1)$ level sets, $L = \{L_0, \ldots, L_d\}$. If there is at least
one directed edge going from one of the nodes in level set $L_i$ to one of the nodes in level set $L_j$,
$i < j$, then we say that there is a directed connection going from $L_i$ to $L_j$. Otherwise, if there
is no directed edge between nodes in two different level sets within an ADG, we say that there
is no connection between the two level sets. Thus, we define the connection matrix of level sets,
$L = \{L_0, \ldots, L_d\}$, as a $(d+1) \times (d+1)$ matrix, $C = [c_{kl}]_{(d+1) \times (d+1)}$, as follows:

$c_{k+1,l+1} = \begin{cases} 1, & \text{if there is a directed connection from } L_k \text{ to } L_l, \text{ where } L_k, L_l \in L; \\ 0, & \text{otherwise.} \end{cases}$ (9)


For example, the connection matrix $C$ of the ADG with seven nodes and four level sets in Figure
1 can be written as

$C = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$. (10)
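As a minimal sketch of definitions (8) and (9), the Python code below builds an adjacency matrix from an edge list and derives the connection matrix from it. The seven-node graph is hypothetical: it was chosen so that its connection matrix has the same pattern as (10), but it is not claimed to be the graph in Figure 1. Nodes and level sets are indexed from 0, so the entry `C[k][l]` in the code corresponds to $c_{k+1,l+1}$ in (9).

```python
def adjacency_matrix(n, edges):
    """Y[i][j] = 1 if there is a directed edge from node i to node j, else 0 (equation (8))."""
    Y = [[0] * n for _ in range(n)]
    for i, j in edges:
        Y[i][j] = 1
    return Y


def connection_matrix(Y, levels):
    """C[k][l] = 1 if some node in level set k has an edge to some node in level set l."""
    m = len(levels)
    C = [[0] * m for _ in range(m)]
    for k, L_k in enumerate(levels):
        for l, L_l in enumerate(levels):
            if k != l and any(Y[i][j] for i in L_k for j in L_l):
                C[k][l] = 1
    return C


# Hypothetical ADG with 7 nodes (0..6) and 4 level sets
edges = [(0, 2), (1, 3), (2, 4), (3, 5), (0, 5), (4, 6), (1, 6)]
levels = [[0, 1], [2, 3], [4, 5], [6]]

Y = adjacency_matrix(7, edges)
C = connection_matrix(Y, levels)
for row in C:
    print(row)
```

Because edges only run from lower-order to higher-order level sets, the resulting matrix is strictly upper triangular, and property 2 of level sets forces every entry just above the diagonal to equal 1.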


Now  we  shall  discuss  modeling  strategies  for  ADGs.  As  mentioned  before,  the  ERGM  family  is 
popular  for  modeling  ADGs.  A  typical  ERGM  defines  the  probability  of  an  ADG  as  (Hunter  and 
Handcock,  2005): 

$P(G \mid \Theta) = \frac{\exp[\Theta^{\top} g(G)]}{c(\Theta)}$, (11)

where $\Theta$ is a vector of parameters; $g(G)$ is a vector of graph statistics that is sufficient for (11);
and $c(\Theta)$ is the normalizing constant. For example, the E-R model is a specific case of an ERGM
defined as:

$P(G \mid \theta) \propto \exp[-\theta |G|]$, (12)

where $|G|$ is the number of edges in $G$, $\theta > 0$, and $e^{-\theta}$ is interpreted as the probability of having
an edge between any pair of nodes in the ADG $G$. Therefore, the E-R model implies that the
probability of having an edge between each pair of nodes is equal and independent within a graph.
Although  the  ERGM  family  allows  inclusion  of  other  graphical  structures,  research  on  what  type  of 
graphical  statistics  can  be  used  in  the  ERGM  is  still  in  its  infancy.  Furthermore,  the  ERGM  family 
is  difficult  to  apply  to  high-dimensional  networks. 
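To make (11) and (12) concrete for a very small case, the sketch below enumerates every ADG on three labeled nodes, assigns each the weight $\exp[-\theta |G|]$, and normalizes by brute force. This is only feasible for tiny $n$; the function names and the value of $\theta$ are assumptions made for illustration.

```python
import itertools
import math


def is_acyclic(n, edges):
    """Kahn-style check: repeatedly remove nodes that have no remaining incoming edges."""
    indeg = [0] * n
    for _, b in edges:
        indeg[b] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return removed == n


def all_adgs(n):
    """All acyclic directed graphs on n labeled nodes, each as a list of (from, to) edges."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    graphs = []
    for mask in itertools.product([0, 1], repeat=len(pairs)):
        edges = [p for p, keep in zip(pairs, mask) if keep]
        if is_acyclic(n, edges):
            graphs.append(edges)
    return graphs


def er_probabilities(n, theta):
    """Probabilities under (12), normalized over all ADGs on n nodes."""
    graphs = all_adgs(n)
    weights = [math.exp(-theta * len(g)) for g in graphs]
    c = sum(weights)              # brute-force normalizing constant
    return graphs, [w / c for w in weights], c


graphs, probs, c = er_probabilities(3, theta=1.0)  # theta chosen arbitrarily
```

Any two ADGs with the same number of edges receive the same probability here, whatever the directions or patterns of their edges, which is the limitation of the edge-count statistic discussed earlier.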
We shall propose level-sets models below to avoid these limitations of ERGMs. Rather than directly
modeling the joint probability distribution of every individual node in the graph, our strategy is
to first model the probability distribution of the unique level-sets structure of an ADG; then, we
model the probability distributions of ADGs conditional on their level-sets structure. This uses information
on the children’s and parents’ directed edges contained in the level-set structure. Also, this
conditional-probability modeling strategy helps us avoid the strong assumptions made in defining
ERGMs.

We  define  the  level-sets  model  as  follows: 

$P(G \mid \Theta) = P(G \mid C, \Theta)\, P(C \mid L, \Theta)\, P(L \mid V, \Theta)\, P(V \mid \Theta)$. (13)

Since the adjacency matrix $Y$ and the ADG $G$ are in one-to-one correspondence, we also can write
equation (13) as

$P(Y \mid \Theta) = P(Y \mid C, \Theta)\, P(C \mid L, \Theta)\, P(L \mid V, \Theta)\, P(V \mid \Theta)$, (14)

where $\Theta$ is a vector of parameters (e.g., Ellis and Wong, 2008).

Based  on  the  level-sets  model  (13),  we  develop  an  associated  algorithm  that  can  efficiently  generate 
ADGs.  Compared  to  the  E-R  model,  our  level-sets  model  is  appealing  as  a  flexible  prior  distribution 
for  Bayesian  inference  on  ADGs.  Zhuang  and  Cressie  (2011)  show  how  this  algorithm  can  be  used 
in  Bayesian  inference  for  multivariate  distributions  defined  by  ADGs  and  eventually  chain  graphs. 
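The sequential structure of (14) suggests drawing the level sets first, then the connection matrix, then the edges. The Python sketch below follows that order under loudly stated assumptions: the text does not specify the component distributions, so the uniform choice of the number of level sets, the Bernoulli parameter `p_extra`, and all function names are hypothetical. It is not the algorithm of Zhuang and Cressie (2011), only an illustration of how a draw can respect the level-set properties.

```python
import random


def sample_adg(n, rng, p_extra=0.3):
    """Draw (levels, C, Y) in the order L | V, then C | L, then Y | C. Requires n >= 1."""
    # V: n is given, so the node set {0, ..., n-1} is fixed.
    order = list(range(n))
    rng.shuffle(order)
    # L | V (assumed): uniform number of level sets, random nonempty ordered partition.
    m = rng.randint(1, n)
    cuts = sorted(rng.sample(range(1, n), m - 1))
    levels = [order[a:b] for a, b in zip([0] + cuts, cuts + [n])]
    # C | L (assumed): adjacent level sets are always connected; others with prob. p_extra.
    C = [[0] * m for _ in range(m)]
    for k in range(m):
        for l in range(k + 1, m):
            C[k][l] = 1 if l == k + 1 or rng.random() < p_extra else 0
    # Y | C (assumed): edges only along connections, from lower to higher level sets.
    Y = [[0] * n for _ in range(n)]
    for l in range(1, m):
        for k in range(l):
            if not C[k][l]:
                continue
            if k == l - 1:
                for j in levels[l]:      # every node needs a parent in the adjacent level set
                    Y[rng.choice(levels[k])][j] = 1
            else:                        # at least one edge must realize the connection
                Y[rng.choice(levels[k])][rng.choice(levels[l])] = 1
            for i in levels[k]:          # optional additional edges
                for j in levels[l]:
                    if rng.random() < p_extra:
                        Y[i][j] = 1
    return levels, C, Y


def level_sets_from_adjacency(Y):
    """Recover the level sets of Y by repeatedly taking the cover of the nodes placed so far."""
    n = len(Y)
    pa = [{i for i in range(n) if Y[i][j]} for j in range(n)]
    placed, levels = set(), []
    while len(placed) < n:
        nxt = {j for j in range(n) if j not in placed and pa[j] <= placed}
        if not nxt:
            raise ValueError("the graph contains a directed cycle")
        levels.append(nxt)
        placed |= nxt
    return levels


levels, C, Y = sample_adg(7, random.Random(0))
recovered = level_sets_from_adjacency(Y)   # should agree with the sampled level sets
```

Because every node above the minimal level set receives a parent from the adjacent lower-order level set and no edge points backward or within a level set, the level sets recovered from $Y$ coincide with the sampled ones, so each draw is a valid ADG without any acyclicity check.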